10 research outputs found

    Entity-Centric Stream Filtering and Ranking: Filtering and Unfilterable Documents

    Cumulative Citation Recommendation (CCR) is defined as follows: given a stream of documents on one hand and Knowledge Base (KB) entities on the other, filter, rank, and recommend citation-worthy documents. The pipeline encountered in systems that approach this problem involves four stages: filtering, classification, ranking (or scoring), and evaluation. Filtering is only an initial step that reduces the web-scale corpus to a working set of documents that is more manageable for the subsequent stages. Nevertheless, this step has a large impact on the maximum recall that can be attained. This study analyzes in depth the main factors that affect recall in the filtering stage. We investigate the impact of choices for corpus cleansing, entity profile construction, entity type, document type, and relevance grade. Because recall lost in this first step of the pipeline cannot be recovered later on, we identify and characterize the citation-worthy documents that do not pass the filtering stage by examining their contents.
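
As a rough illustration of why filtering caps recall, the sketch below assumes a simple surface-form filter: a document passes if it contains any name variant from a hypothetical entity profile. The names `passes_filter`, `filtering_recall`, and `unfilterable`, and the example entity "Ada Example", are illustrative and are not taken from the study, which compares several cleansing and profile-construction choices that are not modelled here.

```python
from typing import Dict, Iterable, List, Set


def passes_filter(text: str, profile: Iterable[str]) -> bool:
    """Keep a document if any surface form in the entity profile occurs in it.

    Case-insensitive substring matching is an illustrative assumption.
    """
    lowered = text.lower()
    return any(form.lower() in lowered for form in profile)


def filtering_recall(docs: Dict[str, str], profile: List[str], relevant: Set[str]) -> float:
    """Fraction of citation-worthy documents that pass the filter.

    Documents dropped here never reach classification or ranking, so this
    value bounds the recall of the whole pipeline (under this toy setup).
    """
    if not relevant:
        return 0.0
    kept = {doc_id for doc_id, text in docs.items() if passes_filter(text, profile)}
    return len(kept & relevant) / len(relevant)


def unfilterable(docs: Dict[str, str], profile: List[str], relevant: Set[str]) -> Set[str]:
    """Citation-worthy documents that the filter drops."""
    return {d for d in relevant if d in docs and not passes_filter(docs[d], profile)}


if __name__ == "__main__":
    # Hypothetical toy stream: d2 is relevant but never names the entity.
    docs = {
        "d1": "Ada Example spoke in Delft today.",
        "d2": "The researcher published a paper.",
        "d3": "Weather in Amsterdam.",
    }
    profile = ["Ada Example", "A. Example"]
    relevant = {"d1", "d2"}
    print(filtering_recall(docs, profile, relevant))  # 0.5
    print(sorted(unfilterable(docs, profile, relevant)))  # ['d2']
```

Under these assumptions, a relevant document such as `d2` that refers to the entity only indirectly is lost at the first stage, which is the kind of "unfilterable" case the abstract sets out to characterize.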

    Random performance differences between online recommender system algorithms

    In the evaluation of recommender systems, the quality of recommendations made by a newly proposed algorithm is compared to the state of the art, using a given quality measure and dataset. The validity of the evaluation depends on the assumption that it does not exhibit artefacts resulting from the process of collecting the dataset. The main difference between online and offline evaluation is that, in the online setting, the user's response to a recommendation is observed only once. We used the NewsREEL challenge to gain a deeper understanding of the implications of this difference for comparisons between recommender systems. The experiments aim to quantify the expected degree of variation in performance that cannot be attributed to differences between systems. We classify and discuss the non-algorithmic causes of the observed performance differences.
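
The kind of non-algorithmic variation described above can be illustrated with an A/A-style simulation, a technique swapped in here for illustration and not the NewsREEL protocol itself. It assumes two identical systems with the same true click-through rate (CTR), with each impression observed once as an independent Bernoulli click, so any observed gap between them is pure noise. All function names and parameter values are hypothetical.

```python
import random
import statistics
from typing import List


def simulate_ctr_gap(true_ctr: float, impressions: int, rng: random.Random) -> float:
    """Observed CTR difference between two *identical* systems.

    Each impression is observed once and modelled as an independent
    Bernoulli click with the same probability for both systems.
    """
    clicks_a = sum(rng.random() < true_ctr for _ in range(impressions))
    clicks_b = sum(rng.random() < true_ctr for _ in range(impressions))
    return (clicks_a - clicks_b) / impressions


def gap_distribution(true_ctr: float, impressions: int, trials: int, seed: int = 0) -> List[float]:
    """Repeat the A/A comparison `trials` times with a seeded generator."""
    rng = random.Random(seed)
    return [simulate_ctr_gap(true_ctr, impressions, rng) for _ in range(trials)]


def expected_gap_sd(true_ctr: float, impressions: int) -> float:
    """Analytic SD of the difference of two independent binomial proportions."""
    return (2 * true_ctr * (1 - true_ctr) / impressions) ** 0.5


if __name__ == "__main__":
    gaps = gap_distribution(true_ctr=0.05, impressions=2_000, trials=300)
    print("largest gap seen:", max(abs(g) for g in gaps))
    print("empirical SD:", statistics.stdev(gaps))
    print("analytic SD: ", expected_gap_sd(0.05, 2_000))
```

Under these assumptions the empirical spread of the gap tracks the analytic standard deviation, which shrinks only with the square root of the number of impressions; a difference between two real systems smaller than a few such standard deviations cannot be attributed to the algorithms.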

    Cumulative Citation Recommendation: A Feature-aware Comparisons of Approaches

    In this work, we conduct a feature-aware comparison of approaches to Cumulative Citation Recommendation (CCR), a task that aims to filter and rank a stream of documents according to their relevance to entities in a knowledge base. We conducted experiments starting with a large feature set, identified a powerful subset, and used it to compare classification and learning-to-rank algorithms. With a small set of powerful features, we achieve better performance than the state of the art. Surprisingly, our findings challenge the previously reported preference for learning-to-rank over classification: in our study, the classification approach outperforms learning-to-rank on CCR. This indicates that comparing two approaches is problematic because of the interplay between the approaches themselves and the feature sets one chooses to use.

    CWI at TREC 2012, KBA track and Session Track

    We participated in two tracks: the Knowledge Base Acceleration (KBA) Track and the Session Track. In the KBA track, we focused on experimenting with different approaches, as this is the first year the track has been run. We experimented with supervised and unsupervised retrieval models. Our supervised approaches include language models and a string-learning system. Our unsupervised approaches use 1) DBpedia labels and 2) the Google Cross-Lingual Dictionary (GCLD). While the GCLD approach targets the central and relevant bins, all the others target only the central bin. The GCLD approach and the string-learning system outperformed the others in their respective targeted bins. The goal of the Session track submission is to evaluate whether and how a logic framework for representing user interactions with an IR system can be used to improve the approximation of the relevant term distribution that another system, one that is supposed to have access to the session information, will then calculate. the documents in the stream corpora. Three of the seven runs used a Hadoop cluster provided by Sara.nl to process the stream corpora. The other four runs used federated access to the same corpora distributed across seven workstations.

    CWI and TU Delft at TREC 2013: Contextual Suggestion, Federated Web Search, KBA, and Web Tracks

    This paper provides an overview of the work done at Centrum Wiskunde & Informatica (CWI) and Delft University of Technology (TU Delft) for different tracks of TREC 2013. We participated in the Contextual Suggestion Track, the Federated Web Search Track, the Knowledge Base Acceleration (KBA) Track, and the Web Ad-hoc Track. In the Contextual Suggestion track, we focused on filtering the entire ClueWeb12 collection to generate recommendations according to the provided user profiles and contexts. For the Federated Web Search track, we exploited both categories from ODP and document relevance to merge result lists. In the KBA track, we focused on the Cumulative Citation Recommendation task, where we applied different features to two classification algorithms. For the Web track, we extended an ad-hoc baseline with a proximity model that promotes documents in which the query terms are positioned closer together.
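
One common way to build a proximity signal like the one mentioned for the Web track is the minimum cover span: the length of the shortest token window that contains every query term. The sketch below assumes that signal and a simple additive bonus `weight / span` on top of a baseline score; the proximity model actually used in the TREC 2013 runs is not specified in this abstract, so the scoring formula and all names are illustrative assumptions.

```python
from typing import Dict, List, Optional


def min_cover_span(tokens: List[str], query_terms: List[str]) -> Optional[int]:
    """Length of the shortest token window containing every query term.

    Returns None if some query term is missing. Sliding window, O(n).
    """
    needed = set(query_terms)
    if not needed:
        return None
    counts: Dict[str, int] = {}
    covered = 0
    best: Optional[int] = None
    left = 0
    for right, tok in enumerate(tokens):
        if tok in needed:
            counts[tok] = counts.get(tok, 0) + 1
            if counts[tok] == 1:
                covered += 1
        # Shrink from the left while the window still covers all terms.
        while covered == len(needed):
            span = right - left + 1
            if best is None or span < best:
                best = span
            left_tok = tokens[left]
            if left_tok in needed:
                counts[left_tok] -= 1
                if counts[left_tok] == 0:
                    covered -= 1
            left += 1
    return best


def proximity_rerank(
    baseline: Dict[str, float],
    docs: Dict[str, List[str]],
    query_terms: List[str],
    weight: float = 1.0,
) -> List[str]:
    """Add a hypothetical bonus weight / span to each baseline score and re-sort."""
    scored: Dict[str, float] = {}
    for doc_id, score in baseline.items():
        span = min_cover_span(docs[doc_id], query_terms)
        bonus = weight / span if span else 0.0  # no bonus if a term is missing
        scored[doc_id] = score + bonus
    return sorted(scored, key=lambda d: scored[d], reverse=True)


if __name__ == "__main__":
    docs = {
        "d1": "cheap flights to paris in june".split(),
        "d2": "paris is lovely and cheap hotels offer deals on flights".split(),
    }
    print(proximity_rerank({"d1": 1.0, "d2": 1.0}, docs, ["cheap", "flights"]))
```

With equal baseline scores, the document in which "cheap" and "flights" are adjacent is promoted over the one in which they are six tokens apart, which is the behaviour the abstract describes.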

    The global, regional, and national burden of oesophageal cancer and its attributable risk factors in 195 countries and territories, 1990–2017: a systematic analysis for the Global Burden of Disease Study 2017

    Background Oesophageal cancer is a common and often fatal cancer that has two main histological subtypes: oesophageal squamous cell carcinoma and oesophageal adenocarcinoma. Updated statistics on the incidence and mortality of oesophageal cancer, and on the disability-adjusted life-years (DALYs) caused by the disease, can assist policy makers in allocating resources for prevention, treatment, and care of oesophageal cancer. We report the latest estimates of these statistics for 195 countries and territories between 1990 and 2017, by age, sex, and Socio-demographic Index (SDI), using data from the Global Burden of Diseases, Injuries, and Risk Factors Study 2017 (GBD). Methods We used data from vital registration systems, vital registration-samples, verbal autopsy records, and cancer registries, combined with relevant modelling, to estimate the mortality, incidence, and burden of oesophageal cancer from 1990 to 2017. Mortality-to-incidence ratios (MIRs) were estimated and fed into a Cause of Death Ensemble model (CODEm) including risk factors. MIRs were used for mortality and non-fatal modelling. Estimates of DALYs attributable to the main risk factors of oesophageal cancer available in GBD were also calculated. The proportion of oesophageal squamous cell carcinoma to all oesophageal cancers was extracted by use of publicly available data, and its variation was examined against SDI, the Healthcare Access and Quality (HAQ) Index, and available risk factors in GBD that are specific for oesophageal squamous cell carcinoma (eg, unimproved water source and indoor air pollution) and for oesophageal adenocarcinoma (gastro-oesophageal reflux disease). Findings There were 473 000 (95% uncertainty interval [95% UI] 459 000–485 000) new cases of oesophageal cancer and 436 000 (425 000–448 000) deaths due to oesophageal cancer in 2017. Age-standardised incidence was 5·9 (5·7–6·1) per 100 000 population and age-standardised mortality was 5·5 (5·3–5·6) per 100 000. 
Oesophageal cancer caused 9·78 million (9·53–10·03) DALYs, with an age-standardised rate of 120 (117–123) per 100 000 population. Between 1990 and 2017, age-standardised incidence decreased by 22·0% (18·6–25·2), mortality decreased by 29·0% (25·8–32·0), and DALYs decreased by 33·4% (30·4–36·1) globally. However, as a result of population growth and ageing, the total number of new cases increased by 52·3% (45·9–58·9), from 310 000 (300 000–322 000) to 473 000 (459 000–485 000); the number of deaths increased by 40·0% (34·1–46·3), from 311 000 (301 000–323 000) to 436 000 (425 000–448 000); and total DALYs increased by 27·4% (22·1–33·1), from 7·68 million (7·42–7·97) to 9·78 million (9·53–10·03). At the national level, China had the highest number of incident cases (235 000 [223 000–246 000]), deaths (213 000 [203 000–223 000]), and DALYs (4·46 million [4·25–4·69]) in 2017. The highest national-level age-standardised incidence rates in 2017 were observed in Malawi (23·0 [19·4–26·5] per 100 000 population) and Mongolia (18·5 [16·4–20·8] per 100 000). In 2017, age-standardised incidence was 2·7 times higher, mortality 2·9 times higher, and DALYs 3·0 times higher in males than in females. In 2017, a substantial proportion of oesophageal cancer DALYs were attributable to known risk factors: tobacco smoking (39·0% [35·5–42·2]), alcohol consumption (33·8% [27·3–39·9]), high BMI (19·5% [6·3–36·0]), a diet low in fruits (19·1% [4·2–34·6]), and use of chewing tobacco (7·5% [5·2–9·6]). Countries with a low SDI and HAQ Index and high levels of indoor air pollution had a higher proportion of oesophageal squamous cell carcinoma to all oesophageal cancer cases than did countries with a high SDI and HAQ Index and with low levels of indoor air pollution. Interpretation Despite reductions in age-standardised incidence and mortality rates, oesophageal cancer remains a major cause of cancer mortality and burden across the world. 
Oesophageal cancer is a highly fatal disease, requiring increased primary prevention efforts and, possibly, screening in some high-risk areas. Substantial variation exists in age-standardised incidence rates across regions and countries, for reasons that are unclear. Funding Bill & Melinda Gates Foundation. © 2020 The Author(s)

    Global injury morbidity and mortality from 1990 to 2017: Results from the global burden of disease study 2017

    Background Past research in population health trends has shown that injuries form a substantial burden of population health loss. Regular updates to injury burden assessments are critical. We report Global Burden of Disease (GBD) 2017 Study estimates on morbidity and mortality for all injuries. Methods We reviewed results for injuries from the GBD 2017 study. GBD 2017 measured injury-specific mortality and years of life lost (YLLs) using the Cause of Death Ensemble model. To measure non-fatal injuries, GBD 2017 modelled injury-specific incidence and converted this to prevalence and years lived with disability (YLDs). YLLs and YLDs were summed to calculate disability-adjusted life years (DALYs). Findings In 1990, there were 4 260 493 (4 085 700 to 4 396 138) injury deaths, which increased to 4 484 722 (4 332 010 to 4 585 554) deaths in 2017, while age-standardised mortality decreased from 1079 (1073 to 1086) to 738 (730 to 745) per 100 000. In 1990, there were 354 064 302 (95% uncertainty interval: 338 174 876 to 371 610 802) new cases of injury globally, which increased to 520 710 288 (493 430 247 to 547 988 635) new cases in 2017. During this time, age-standardised incidence decreased non-significantly from 6824 (6534 to 7147) to 6763 (6412 to 7118) per 100 000. Between 1990 and 2017, age-standardised DALYs decreased from 4947 (4655 to 5233) per 100 000 to 3267 (3058 to 3505). Interpretation Injuries are an important cause of health loss globally, though mortality has declined between 1990 and 2017. Future research in injury burden should focus on prevention in high-burden populations, improving data collection, and ensuring access to medical care. © Author(s) (or their employer(s)) 2020. Re-use permitted under CC BY. Published by BMJ
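
The aggregation step described in the Methods can be written out explicitly. The decomposition of each term follows the standard GBD formulation and is added here for illustration; the symbols are not defined in the abstract itself.

```latex
\mathrm{DALY} = \mathrm{YLL} + \mathrm{YLD}, \qquad
\mathrm{YLL} = \sum_{a} D_a \, L_a, \qquad
\mathrm{YLD} = \sum_{s} P_s \, \mathrm{DW}_s
```

Here $D_a$ is the number of deaths at age $a$, $L_a$ is the standard remaining life expectancy at that age, $P_s$ is the prevalence of sequela $s$, and $\mathrm{DW}_s$ is its disability weight.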

    Mapping 123 million neonatal, infant and child deaths between 2000 and 2017

    Since 2000, many countries have achieved considerable success in improving child survival, but localized progress remains unclear. To inform efforts towards United Nations Sustainable Development Goal 3.2—to end preventable child deaths by 2030—we need consistently estimated data at the subnational level regarding child mortality rates and trends. Here we quantified, for the period 2000–2017, the subnational variation in mortality rates and number of deaths of neonates, infants and children under 5 years of age within 99 low- and middle-income countries using a geostatistical survival model. We estimated that 32% of children under 5 in these countries lived in districts that had attained rates of 25 or fewer child deaths per 1,000 live births by 2017, and that 58% of child deaths between 2000 and 2017 in these countries could have been averted in the absence of geographical inequality. This study enables the identification of high-mortality clusters, patterns of progress and geographical inequalities to inform appropriate investments and implementations that will help to improve the health of all populations. © 2019, The Author(s)